
    Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes

    Background: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data produced over the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast). Results: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. The numbers of phosphoproteins and p-sites were estimated by two methods: capture-recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and for other confounding factors. We estimate that, in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were less reliable. Conclusions: Most phosphoproteins have already been discovered for human, mouse, and yeast, whereas the dataset for Arabidopsis is still far from complete. The p-site datasets are further from saturation than the phosphoprotein datasets. Integration of the LTP data suggests that current HTP phosphoproteomics captures 70% to 95% of all phosphoproteins but only 40% to 60% of all p-sites.
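    Capture-recapture treats two independent datasets as two "captures" from the same proteome: the larger their overlap relative to their sizes, the closer the combined catalogue is to the true total. The abstract does not say which estimator was used, so the sketch below assumes the classic two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant; the protein identifiers are hypothetical. Both estimators assume a closed population and equal detectability of every item, assumptions that noisy HTP data meet only approximately, which is why the noise adjustments described above matter.

```python
from typing import Hashable, Iterable, Tuple


def _sizes(sample_a: Iterable[Hashable], sample_b: Iterable[Hashable]) -> Tuple[int, int, int]:
    """Return (n1, n2, m): the size of each sample and of their overlap."""
    a, b = set(sample_a), set(sample_b)
    return len(a), len(b), len(a & b)


def lincoln_petersen(sample_a, sample_b) -> float:
    """Classic two-sample estimate N = n1 * n2 / m (undefined when m == 0)."""
    n1, n2, m = _sizes(sample_a, sample_b)
    if m == 0:
        raise ValueError("no overlap between samples; estimate is undefined")
    return n1 * n2 / m


def chapman(sample_a, sample_b) -> float:
    """Chapman's bias-corrected estimate N = (n1 + 1)(n2 + 1) / (m + 1) - 1."""
    n1, n2, m = _sizes(sample_a, sample_b)
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical example: two phosphoproteomic "captures" of 60 proteins each,
# sharing 30 identifiers (90 distinct proteins observed overall).
dataset_1 = {f"P{i}" for i in range(0, 60)}
dataset_2 = {f"P{i}" for i in range(30, 90)}
print(lincoln_petersen(dataset_1, dataset_2))   # 120.0
print(round(chapman(dataset_1, dataset_2), 2))  # 119.03
```

    In this toy case 90 distinct proteins have been observed, yet both estimators suggest roughly 120 exist; the difference is the part of the proteome that neither dataset captured.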

    The challenges of interpreting phosphoproteomics data: a critical view through the bioinformatics lens

    During the last decade, there has been great progress in high-throughput (HTP) phosphoproteomics, and hundreds or even thousands of phosphorylation sites (p-sites) can now be detected in a single experiment. This success is attributable to a combination of very sensitive mass spectrometry instruments, better phosphopeptide enrichment techniques, and bioinformatics software capable of detecting peptides and localizing p-sites. These new technologies have opened up a whole new level of gene regulation for study, with great potential for therapeutics and synthetic biology. Nevertheless, many challenges remain to be resolved: the biases and noise of these proteomic technologies, the biological noise that is present, and the incompleteness of the current datasets. Despite these problems, the datasets published so far appear to represent a good sample of the complete phosphoproteomes of some organisms and are capable of revealing their major properties.

    Choose your partners: dimerization in eukaryotic transcription factors

    In many eukaryotic transcription factor gene families, proteins must physically interact with an identical molecule or with another member of the same family to form a functional dimer and bind DNA. Depending on the choice of partner and the cellular context, each dimer triggers a sequence of regulatory events that leads to a particular cellular fate, for example, proliferation or differentiation. Recent syntheses of genomic and functional data reveal that partner choice is not random; instead, it is governed by dimerization specificities that are strongly linked to the evolution of the protein family. Our focus is on understanding these interaction specificities, their functional consequences, and how they evolved. This knowledge is essential for understanding gene regulation and for designing a new generation of drugs.

    Evolution and taxonomic distribution of nonribosomal peptide and polyketide synthases

    The majority of nonribosomal peptide synthases and type I polyketide synthases are multimodular megasynthases that produce oligopeptide and polyketide secondary metabolites, respectively. Owing to their multimodular architecture, they synthesize their metabolites following an assembly-line logic. The ongoing genomic revolution, together with the application of computational tools, has made it possible to mine genomes for these enzymes and to identify the organisms that produce many oligopeptide and polyketide metabolites. In addition, researchers have begun to understand the molecular mechanisms of megasynthase evolution: duplication, recombination, point mutation, and module skipping. This knowledge, together with computational analyses, has been applied to predicting the specificity of these megasynthases and the structure of their end products. It is an exciting field, both for gaining deeper insight into the underlying molecular mechanisms and for exploiting these enzymes biotechnologically.
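    The assembly-line logic and module skipping described above can be illustrated with a toy colinear model in which each module incorporates one monomer, in the order the modules appear. This is a deliberate simplification rather than a predictive tool: the module names and substrates are hypothetical, and real megasynthases also involve tailoring domains, iterative module use, and chain release or cyclization steps that the sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Module:
    name: str       # hypothetical module label
    substrate: str  # monomer this module is assumed to incorporate


def assemble(modules: List[Module], skipped: Optional[Set[str]] = None) -> List[str]:
    """Colinear toy model: each active module extends the growing chain in
    module order; modules listed in `skipped` contribute nothing."""
    skipped = skipped or set()
    chain: List[str] = []
    for module in modules:
        if module.name in skipped:
            continue  # module skipping: this module's monomer is not added
        chain.append(module.substrate)
    return chain


line = [Module("M1", "Leu"), Module("M2", "Val"), Module("M3", "Gly")]
print(assemble(line))                  # ['Leu', 'Val', 'Gly']
print(assemble(line, skipped={"M2"}))  # ['Leu', 'Gly']
```

    Under this colinearity assumption, the predicted product follows directly from the module order, which is why skipping a single module yields a shorter but otherwise related metabolite.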

    An exploration of alternative visualisations of the basic helix-loop-helix protein interaction network

    Background: Alternative representations of biochemical networks emphasise different aspects of the data and contribute to the understanding of complex biological systems. In this study we present a variety of automated methods for visualisation of a protein-protein interaction network, using the basic helix-loop-helix (bHLH) family of transcription factors as an example. Results: Network representations that arrange nodes (proteins) according to either continuous or discrete information are investigated, revealing the existence of protein sub-families and the retention of interactions following gene duplication events. Methods of network visualisation in conjunction with a phylogenetic tree are presented, highlighting the evolutionary relationships between proteins and clarifying the context of network hubs and interaction clusters. Finally, an optimisation technique is used to create a three-dimensional layout of the phylogenetic tree upon which the protein-protein interactions may be projected. Conclusion: We show that by incorporating secondary genomic, functional or phylogenetic information into network visualisation, it is possible to move beyond simple layout algorithms based on network topology towards more biologically meaningful representations. These new visualisations can give structure to complex networks and will greatly help in interpreting their evolutionary origins and functional implications. Three open-source software packages (InterView, TVi and OptiMage) implementing our methods are available.
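    One simple way to arrange a protein interaction network by phylogenetic rather than purely topological information is to place proteins on a circle in the leaf order of their gene tree, so that close relatives sit side by side and interactions become chords between them. The sketch below illustrates that idea under stated assumptions and is not the InterView, TVi or OptiMage code: the tree is a nested tuple, the protein names and interactions are hypothetical, and it produces a 2D layout rather than the optimised 3D layout described above.

```python
import math
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]      # a leaf name, or a tuple of subtrees
Point = Tuple[float, float]


def leaf_order(tree: Tree) -> List[str]:
    """Depth-first, left-to-right list of leaf names."""
    if isinstance(tree, str):
        return [tree]
    leaves: List[str] = []
    for child in tree:
        leaves.extend(leaf_order(child))
    return leaves


def circular_tree_layout(tree: Tree, radius: float = 1.0) -> Dict[str, Point]:
    """Place leaves evenly on a circle in tree order, starting at angle 0."""
    leaves = leaf_order(tree)
    n = len(leaves)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(leaves)
    }


def edge_segments(layout: Dict[str, Point],
                  interactions: List[Tuple[str, str]]) -> List[Tuple[Point, Point]]:
    """Project interactions onto the layout as straight line segments."""
    return [(layout[a], layout[b]) for a, b in interactions]


# Hypothetical family tree and interactions.
tree = (("P1", "P2"), ("P3", "P4"))
layout = circular_tree_layout(tree)
segments = edge_segments(layout, [("P1", "P3"), ("P2", "P4")])
print(leaf_order(tree))  # ['P1', 'P2', 'P3', 'P4']
```

    The segments can be passed to any plotting library; because node positions depend only on the tree, the same family can be compared across interaction datasets without the layout shifting.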

    T-RECs: Rapid and large-scale detection of recombination events among different evolutionary lineages of viral genomes

    Background: Many computational tools that detect recombination in viruses are not adapted to the ongoing genomic revolution. A computational tool is needed that can rapidly scan hundreds or thousands of genomes or sequence fragments and detect candidate recombination events, which can then be analyzed further with more sensitive and specialized methods. Results: T-RECs, a Windows-based graphical tool, employs pairwise alignment of sliding windows and can perform (i) genotyping, (ii) clustering of new genomes, (iii) detection of recent recombination events among different evolutionary lineages, (iv) manual inspection of detected recombination events by similarity plots, and (v) annotation of genomic regions. Conclusions: The effectiveness of T-RECs was demonstrated by an analysis of 555 complete Norovirus genomes and 2500 sequence fragments, in which a recombination hotspot was identified at the ORF1-ORF2 junction.
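    Sliding-window comparison can be sketched as follows: for each window of a query, compute its identity to each reference lineage, then report the positions where the best-matching reference changes, which are candidate recombination breakpoints. This is a minimal sketch assuming the sequences are already aligned to equal length; it is not the T-RECs implementation, whose window scoring and thresholds are not described here, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def window_identity(query: str, ref: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Identity between two pre-aligned sequences in each sliding window,
    counted over columns where neither sequence has a gap ('-')."""
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    scores: List[Tuple[int, float]] = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (q, r)
            for q, r in zip(query[start:start + window], ref[start:start + window])
            if q != "-" and r != "-"
        ]
        identity = sum(q == r for q, r in pairs) / len(pairs) if pairs else 0.0
        scores.append((start, identity))
    return scores


def best_reference_per_window(query: str, refs: Dict[str, str],
                              window: int, step: int) -> List[Tuple[int, str]]:
    """Name of the most similar reference for every window."""
    profiles = {name: window_identity(query, seq, window, step) for name, seq in refs.items()}
    starts = [start for start, _ in next(iter(profiles.values()))]
    best: List[Tuple[int, str]] = []
    for i, start in enumerate(starts):
        # max() keeps the first reference in dict order when scores tie
        name = max(profiles, key=lambda ref_name: profiles[ref_name][i][1])
        best.append((start, name))
    return best


def candidate_breakpoints(query: str, refs: Dict[str, str],
                          window: int, step: int) -> List[Tuple[int, str, str]]:
    """Windows where the best-matching reference switches: (start, from, to)."""
    best = best_reference_per_window(query, refs, window, step)
    return [
        (start, prev_name, name)
        for (_, prev_name), (start, name) in zip(best, best[1:])
        if name != prev_name
    ]


# Toy example: the query follows lineage_1 for 20 columns, then lineage_2.
refs = {"lineage_1": "ACGT" * 10, "lineage_2": "TGCA" * 10}
query = refs["lineage_1"][:20] + refs["lineage_2"][20:]
print(candidate_breakpoints(query, refs, window=10, step=10))
# [(20, 'lineage_1', 'lineage_2')]
```

    Window size and step trade resolution against noise: smaller windows localize a breakpoint more precisely but give noisier identity estimates, which is one reason candidates from a fast scan are worth re-examining with more sensitive methods.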

    HPV16-Genotyper: A Computational Tool for Risk-Assessment, Lineage Genotyping and Recombination Detection in HPV16 Sequences, Based on a Large-Scale Evolutionary Analysis

    Previous analyses have identified some, but limited, evidence of recombination among HPV16 genomes, in accordance with a general perception that DNA viruses do not frequently recombine. In this evolutionary/bioinformatics study we have analyzed more than 3600 publicly available complete and partial HPV16 genomes. By studying phylogenetic incongruence, similarity plots and the distribution patterns of lineage-specific SNPs, we identify several potential recombination events between the two major HPV16 evolutionary clades. These two clades comprise the (widely considered) phenotypically more benign (lower-risk) lineage A and the (widely considered) phenotypically more aggressive (higher-risk) non-European lineages B, C and D. We observe a frequency of potential recombinant sequences of between 0.3% and 1.2%, which is low but nevertheless considerable. Our findings have clinical implications and highlight that HPV16 genotyping and risk assessment based only on certain genomic regions, rather than the entire genome, may yield an incorrect genotype and, therefore, an incorrect associated risk estimate. Finally, based on this analysis, we have developed a bioinformatics tool that automates the entire process of HPV16 lineage genotyping and recombination detection and also identifies, within the submitted sequences, SNPs that have been reported in the literature to increase the risk of cancer.
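    The lineage-specific SNP analysis can be illustrated by calling, at each diagnostic alignment position, which lineage's allele a query carries, then collapsing consecutive calls into blocks; a query whose blocks switch lineage along the genome is a candidate recombinant. This is a hypothetical sketch, not the HPV16-Genotyper code: the diagnostic positions, alleles and lineage labels below are invented for illustration, and a real analysis would use curated lineage-specific SNPs and confirm candidates with phylogenetic incongruence and similarity plots.

```python
from typing import Dict, List, Tuple


def call_diagnostic_sites(query: str, diagnostic: Dict[int, Dict[str, str]]) -> List[Tuple[int, str]]:
    """At each diagnostic position (0-based alignment column), report the single
    lineage whose allele the query carries, or '?' if none or several match."""
    calls: List[Tuple[int, str]] = []
    for pos in sorted(diagnostic):
        base = query[pos]
        matching = [lineage for lineage, allele in diagnostic[pos].items() if allele == base]
        calls.append((pos, matching[0] if len(matching) == 1 else "?"))
    return calls


def lineage_blocks(calls: List[Tuple[int, str]]) -> List[Tuple[str, int, int]]:
    """Merge consecutive identical calls into (lineage, first_pos, last_pos),
    skipping '?'. More than one block flags a candidate recombinant."""
    blocks: List[Tuple[str, int, int]] = []
    for pos, lineage in calls:
        if lineage == "?":
            continue
        if blocks and blocks[-1][0] == lineage:
            blocks[-1] = (lineage, blocks[-1][1], pos)
        else:
            blocks.append((lineage, pos, pos))
    return blocks


# Invented diagnostic alleles for two lineages (not real HPV16 SNPs).
diagnostic = {
    1: {"lineage_A": "C", "lineage_D": "T"},
    3: {"lineage_A": "T", "lineage_D": "C"},
    6: {"lineage_A": "A", "lineage_D": "G"},
    8: {"lineage_A": "G", "lineage_D": "A"},
    9: {"lineage_A": "G", "lineage_D": "T"},
}
query = "ACGTACGTAC"
print(lineage_blocks(call_diagnostic_sites(query, diagnostic)))
# [('lineage_A', 1, 3), ('lineage_D', 6, 8)]
```

    Position 9 matches neither lineage and is called '?', so it does not break or extend a block; this also shows why genotyping from a single region can mislead, since a query examined only at positions 1 and 3 would look purely like lineage_A.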